The performance of τ-lepton reconstruction and identification algorithms is studied
using a data sample of proton-proton collisions at √s = 7 TeV, corresponding to an
integrated luminosity of 36 pb^-1.

1 Introduction



The primary goal of the Compact Muon Solenoid (CMS) [1] experiment is to explore particle
physics at the TeV energy scale by studying the final states produced in the proton-proton
collisions at the Large Hadron Collider (LHC) [2]. Leptons play a very important role in these
studies because they often represent an experimentally favourable signature.

The three generations of charged leptons, electrons, muons, and taus, are characterized by
their masses. Because of their higher mass, τ leptons play a crucial role in the searches for
the standard model (SM) Higgs boson, especially for the mass region below twice the W-boson
mass. The motivation for searches for the Higgs boson in its τ-leptonic decays is also supported,
for example, by the minimal supersymmetric standard model (MSSM) [3]. Other models of new
physics, such as supersymmetric left-right models (SUSYLR), also predict increased couplings
to the third-generation charged fermions. As a result, the decay chains of the supersymmetric
particles lead to the lighter stau, which can lead to multi-tau final states [4]. Lepton universality
ensures that one third of W and Z-boson leptonic decays result in τ leptons. When measuring
rare processes, this contribution becomes substantial. For example, in the search for high-mass
SM Higgs bosons that decay preferentially into W and Z bosons, the addition of modes with τ
leptons in the final state improves the early discovery potential.

The lifetime of τ leptons is short enough that they decay before reaching the detector
elements. In two thirds of the cases, τ leptons decay hadronically, typically into one or three
charged mesons (predominantly π^+, π^-), often accompanied by neutral pions (decaying via
π^0 → γγ) and a ν_τ.



The CMS collaboration has designed algorithms that use final-state photons and charged
hadrons to identify hadronic decays of τ leptons (τ_h) through the reconstruction of the
intermediate resonances. The ν_τ escapes undetected and is not considered in the reconstruction.
These algorithms use decay mode identification techniques and efficiently discriminate against
potentially large backgrounds from quarks and gluons that occasionally hadronize into jets of
low particle multiplicity. The algorithms described here have already been successfully used
in a measurement of the Z → ττ production cross section [5] and in a search for neutral MSSM
Higgs bosons decaying into τ pairs [6].

This paper describes performance studies based on a sample of proton-proton collisions
collected during 2010 at √s = 7 TeV, corresponding to an integrated luminosity of 36 pb^-1. The
analysis uses genuine taus from inclusive Z → ττ production. One tau is required to decay
leptonically, into a muon, and the other one hadronically, thus creating a μτ_h final state.
The analysis provides estimates of the τ_h reconstruction and identification efficiency, and
determines the misidentification rate, that is, the probability for quark and gluon jets or
electrons to be misidentified as τ_h. This paper uses the selection requirements that are most
commonly used in the Z and Higgs analyses, and compares the LHC collision data with
predictions based on Monte Carlo (MC) simulation.



2 CMS Detector

A detailed description of CMS can be found elsewhere [1]. The central feature of the CMS
apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of
3.8 T. Within the field volume are the silicon pixel and strip tracker, the crystal electromagnetic
calorimeter (ECAL), and the brass/scintillator hadron calorimeter. Muons are measured in
gas-ionization detectors embedded in the steel return yoke.

CMS uses a right-handed coordinate system, with the origin at the nominal interaction point,
the x axis pointing to the centre of the LHC ring, the y axis pointing up perpendicular to the
LHC plane, and the z axis along the counterclockwise beam direction. The polar angle θ is
measured from the positive z axis and the azimuthal angle φ is measured in the x-y plane.
Variables used in this article are the pseudorapidity, η = -ln[tan(θ/2)], and the transverse
momentum, p_T = √(p_x^2 + p_y^2).

The ECAL is designed to have both excellent energy resolution and high granularity,
properties that are crucial for reconstructing electrons and photons produced in τ-lepton decays.
The ECAL is constructed with projective lead tungstate crystals in two pseudorapidity
regions: the barrel (|η| < 1.479) and the endcap (1.479 < |η| < 3). In the barrel region,
the crystals are 25.8 X_0 long, where X_0 is the radiation length, and provide a granularity of
Δη × Δφ = 0.0174 × 0.0174. The endcap region is instrumented with a lead/silicon-strip
preshower detector consisting of two orthogonal strip detectors with a strip pitch of 1.9 mm.
One plane is at a depth of 2 X_0 and the other at 3 X_0. The ECAL has an energy resolution of
better than 0.5% for unconverted photons with transverse energies above 100 GeV.

The inner tracker measures charged particle tracks within the range |η| < 2.5. It consists of
1440 silicon pixel and 15 148 silicon strip detector modules, and provides an impact parameter
resolution of about 15 μm and a transverse momentum resolution of about 1.5% for 100 GeV
particles. The reconstructed tracks are used to measure the location of the interaction vertices.
The spatial resolution of the reconstruction is about 25 μm for vertices with more than 30
associated tracks [7].

The muon barrel region is covered by drift tubes, and the endcap regions by cathode strip
chambers. In both regions, resistive plate chambers provide additional coordinate and timing
information. Muons can be reconstructed in the range |η| < 2.4, with a typical p_T resolution of
1% for p_T ≈ 40 GeV/c.

3 CMS τ_h Reconstruction Algorithms

CMS has developed two algorithms for identifying τ_h decays, based on the categorization of
the τ_h decay channels through the reconstruction of intermediate resonances: the hadron plus
strips (HPS) and the tau neural classifier (TaNC) algorithms. The HPS algorithm is used as
the main algorithm in most previous CMS τ analyses, with TaNC used for crosschecks. Both
algorithms use particle flow (PF) [8] particles. In the PF approach, information from all
subdetectors is combined to reconstruct and identify all particles produced in the collision. The
particles are classified into mutually exclusive categories: charged hadrons, photons, neutral
hadrons, muons, and electrons. These algorithms are designed to optimize the performance of
the τ_h identification and reconstruction by considering the different hadronic decay modes of
the tau individually. The dominant hadronic decays of τ leptons consist of one or three charged
π^± mesons and up to two π^0 mesons, as summarized in Table 1.

Both algorithms start the reconstruction of a τ_h candidate from a PF jet, whose four-momentum
is reconstructed using the anti-k_T algorithm with a distance parameter R = 0.5 [10]. Using a
PF jet as an initial seed, the algorithms first reconstruct the π^0 components of the τ_h, then
combine them with charged hadrons to reconstruct the tau decay mode and calculate the tau
four-momentum and isolation quantities.

Table 1: Branching fractions of the dominant hadronic decays of the τ lepton and the symbol
and mass of any intermediate resonance [9]. The h stands for both π and K, but in this
analysis the π mass is assigned to all charged particles. The table is symmetric under charge
conjugation.

Decay mode                    Resonance   Mass (MeV/c^2)   Branching fraction (%)
τ^- → h^- ν_τ                                              11.6
τ^- → h^- π^0 ν_τ             ρ^-         770              26.0
τ^- → h^- π^0 π^0 ν_τ         a_1^-       1200             9.5
τ^- → h^- h^+ h^- ν_τ         a_1^-       1200             9.8
τ^- → h^- h^+ h^- π^0 ν_τ                                  4.8

3.1 HPS Algorithm 

The HPS algorithm gives special attention to photon conversions in the CMS tracker material.
The bending of electron/positron tracks in the magnetic field of the CMS solenoid broadens
the calorimeter signatures of neutral pions in the azimuthal direction. This effect is taken into
account in the HPS algorithm by reconstructing photons in "strips", objects that are built out of
electromagnetic particles (PF photons and electrons). The strip reconstruction starts by
centering a strip on the most energetic electromagnetic particle within the PF jet. The algorithm
then searches for other electromagnetic particles within a window of size Δη = 0.05 and
Δφ = 0.20 centered on the strip center. If other electromagnetic particles are found within that
window, the most energetic one gets associated with the strip and the strip four-momentum is
recalculated. The procedure is repeated until no further particles are found that can be
associated with the strip. Strips satisfying a minimum transverse momentum requirement of
p_T^strip > 1 GeV/c are finally combined with the charged hadrons to reconstruct individual
decay modes.
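
The strip-building loop described above can be sketched as follows. This is a minimal illustration, not the CMS implementation: the class `EMParticle` and the functions `wrap_phi`, `merge`, and `build_strips` are hypothetical names, particles are treated as massless, the window is interpreted as |Δη| < 0.05 and |Δφ| < 0.20, and it is assumed that further strips are seeded from the particles left over after a strip is complete.

```python
import math
from dataclasses import dataclass


@dataclass
class EMParticle:
    pt: float   # transverse momentum in GeV/c
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle in radians

    @property
    def energy(self):
        # Massless approximation (assumption of this sketch).
        return self.pt * math.cosh(self.eta)


def wrap_phi(dphi):
    """Map an azimuthal difference into the interval [-pi, pi)."""
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi


def merge(a, b):
    """Add the three-momenta of a and b and return the summed object."""
    px = a.pt * math.cos(a.phi) + b.pt * math.cos(b.phi)
    py = a.pt * math.sin(a.phi) + b.pt * math.sin(b.phi)
    pz = a.pt * math.sinh(a.eta) + b.pt * math.sinh(b.eta)
    pt = math.hypot(px, py)
    return EMParticle(pt, math.asinh(pz / pt), math.atan2(py, px))


def build_strips(particles, d_eta=0.05, d_phi=0.20, min_pt=1.0):
    """Group electromagnetic particles into strips, seeding on the most energetic."""
    remaining = sorted(particles, key=lambda p: p.energy, reverse=True)
    strips = []
    while remaining:
        strip = remaining.pop(0)  # seed: most energetic unused particle
        while True:
            inside = [p for p in remaining
                      if abs(p.eta - strip.eta) < d_eta
                      and abs(wrap_phi(p.phi - strip.phi)) < d_phi]
            if not inside:
                break
            best = max(inside, key=lambda p: p.energy)
            remaining.remove(best)
            strip = merge(strip, best)  # recompute the strip momentum
        if strip.pt > min_pt:
            strips.append(strip)
    return strips
```

The window is re-centered after every merge, so a particle that lies outside the window of the seed can still join the strip once the strip center has moved towards it.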

The decay topologies that are considered by the HPS tau identification algorithm are:

1. Single hadron corresponds to h^- ν_τ and h^- π^0 ν_τ decays in which the neutral pions have
too little energy to be reconstructed as strips.

2. One hadron + one strip reconstructs the decay mode h^- π^0 ν_τ in events in which the photons
from the π^0 decay are close together on the calorimeter surface.

3. One hadron + two strips corresponds to the decay mode h^- π^0 ν_τ in events in which the
photons from π^0 decays are well separated.

4. Three hadrons corresponds to the decay mode h^- h^+ h^- ν_τ. The three charged hadrons are
required to come from the same secondary vertex.

There are no separate decay topologies for the h^- π^0 π^0 ν_τ and h^- h^+ h^- π^0 ν_τ decay
modes. They are reconstructed via the existing topologies. All charged hadrons and strips are
required to be contained within a cone of size ΔR = (2.8 GeV/c) / p_T^{τ_h}, where p_T^{τ_h} is
the transverse momentum of the reconstructed τ_h. The reconstructed tau momentum is required
to match the (η, φ) direction of the original PF jet within a maximum distance of ΔR = 0.1,
where ΔR = √((Δη)^2 + (Δφ)^2).
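
As an illustration of the two angular requirements above, the following sketch computes ΔR with the azimuthal difference wrapped into [-π, π) and checks whether a constituent lies inside the p_T-dependent cone. The function names and the (p_T, η, φ) tuple layout are assumptions made for this example only.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane, with the phi difference wrapped."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def hps_cone_size(tau_pt):
    """Cone size (2.8 GeV/c) / pT for a tau candidate with pT in GeV/c."""
    return 2.8 / tau_pt


def in_signal_cone(tau, particle):
    """tau and particle are (pt, eta, phi) tuples (hypothetical layout)."""
    _, tau_eta, tau_phi = tau
    _, eta, phi = particle
    return delta_r(tau_eta, tau_phi, eta, phi) < hps_cone_size(tau[0])
```

For a candidate with p_T = 28 GeV/c the cone size is 0.1, so the cone shrinks as the candidate becomes more energetic and its decay products more collimated.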

The four-momenta of charged hadrons and strips are reconstructed according to the respective
τ_h decay topology hypothesis, assuming all charged hadrons to be pions, and are required
to be consistent with the masses of the intermediate meson resonances listed in Table 1. The
following invariant mass windows are allowed for candidates: 50-200 MeV/c^2 for π^0,
0.3-1.3 GeV/c^2 for ρ, and 0.8-1.5 GeV/c^2 for a_1. In cases where a τ_h decay is consistent
with more than one hypothesis, the hypothesis giving the highest p_T^{τ_h} is chosen.

Finally, reconstructed candidates are required to be isolated. The isolation criterion requires
that, apart from the τ_h decay products, there be no charged hadrons or photons present within
an isolation cone of size ΔR = 0.5 around the direction of the τ_h. By adjusting the p_T threshold
for particles that are considered in the isolation cone, three working points, "loose", "medium",
and "tight", are defined. The working points are determined using a simulated sample of QCD
dijet events. The "loose" working point corresponds to a probability of approximately 1% for
jets to be misidentified as τ_h. Successive working points reduce the misidentification rate by a
factor of two with respect to the previous one.

3.2 TaNC Algorithm 

In the TaNC case the leading (highest-p_T) particle is required to have a p_T above 5 GeV/c and
to be within ΔR = 0.1 around the jet direction. The PF τ_h four-momentum is reconstructed as a
sum of the four-momenta of all particles with p_T above 0.5 GeV/c in a cone of radius ΔR = 0.15
around the direction of the leading particle. A signal cone size is defined to be ΔR_photons =
0.15 for photons and ΔR_charged = (5 GeV) / E_T for charged hadrons, where E_T is the transverse
energy of the PF τ_h, and ΔR_charged is restricted to be within the range 0.07 < ΔR_charged < 0.15.
The signal cone is the region where the τ_h decay products are expected to be found. An isolation
annulus is defined between the signal cone and a wider isolation cone of outer radius ΔR = 0.5
around the leading particle.
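
The cone definitions above amount to a clamped, energy-dependent radius for charged hadrons and a fixed radius for photons. A minimal sketch, with hypothetical function names and with the boundary values treated as inclusive limits of the clamp, is:

```python
def tanc_charged_signal_cone(tau_et, lower=0.07, upper=0.15):
    """Charged-hadron signal cone (5 GeV) / E_T, restricted to [lower, upper]."""
    return min(upper, max(lower, 5.0 / tau_et))


def tanc_region(dr, is_photon, tau_et, photon_cone=0.15, isolation_cone=0.5):
    """Classify a particle at distance dr from the leading particle."""
    signal = photon_cone if is_photon else tanc_charged_signal_cone(tau_et)
    if dr < signal:
        return "signal"
    if dr < isolation_cone:
        return "isolation"
    return "outside"
```

With E_T = 50 GeV the charged-hadron cone is 0.1, so a charged hadron at ΔR = 0.12 falls into the isolation annulus while a photon at the same distance is still counted in the signal cone.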

The decay mode is reconstructed from the particles that are contained within the signal cone
of the τ_h candidate by counting the number of tracks and π^0 meson candidates. The π^0 meson
candidates are reconstructed by merging pairs of photons that have an invariant mass of less
than 0.2 GeV/c^2. All unpaired photons are considered as π^0 candidates if their p_T exceeds 10%
of the PF τ_h transverse momentum.
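
A possible reading of this pairing step is sketched below. The text does not state the order in which photon pairs are tried, so the greedy pairing in descending p_T order is an assumption of this example, as are the function names and the (p_T, η, φ) tuple layout; photons are treated as massless.

```python
import math
from itertools import combinations


def four_vector(pt, eta, phi):
    """Massless four-vector (E, px, py, pz) from pt, eta, phi."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))


def invariant_mass(p, q):
    """Invariant mass of the sum of two four-vectors, in the units of the inputs."""
    e, px, py, pz = (a + b for a, b in zip(p, q))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def pi0_candidates(photons, tau_pt, max_mass=0.2, min_fraction=0.10):
    """Return candidates as tuples of photon indices: pairs first, then singles."""
    order = sorted(range(len(photons)), key=lambda i: photons[i][0], reverse=True)
    used = set()
    candidates = []
    for i, j in combinations(order, 2):
        if i in used or j in used:
            continue
        mass = invariant_mass(four_vector(*photons[i]), four_vector(*photons[j]))
        if mass < max_mass:  # GeV/c^2
            used.update((i, j))
            candidates.append((i, j))
    for i in order:
        # Unpaired photons count only above 10% of the tau transverse momentum.
        if i not in used and photons[i][0] > min_fraction * tau_pt:
            candidates.append((i,))
    return candidates
```

The number of candidates returned, together with the number of tracks in the signal cone, is what determines the decay mode in this scheme.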

The decay mode of each τ_h candidate is uniquely determined by the multiplicity of
reconstructed objects in the signal cone. Candidates with decay topologies other than those
listed in Table 1 are immediately rejected. Otherwise, a neural network is used to compute a
discriminant quantity for the τ_h candidate. Each decay mode of Table 1 uses a different neural
network. The input observables used for each neural network are optimized for the topology of
the decay mode, and are constructed from the four-momenta of the particles in the signal cone
and the isolation annulus. In general, the signal cone input observables are chosen to
parameterize the decay kinematics of the intermediate resonance, and the isolation cone
observables to describe the multiplicity and p_T spectrum of nearby particles. The variables
include angular correlations between different particles within the signal and the isolation
cones, invariant masses calculated using different combinations of the particles, transverse
momenta, and numbers of charged particles in the signal and the isolation regions. The neural
networks are trained to discriminate between genuine τ_h produced in Z → ττ decays and
misidentified jets from a sample of QCD multijet events. The set of input observables for a
given neural network is chosen to be the minimal set of observables for which the removal of
any two input variables significantly degrades the classification performance.

The output of the neural network is a continuous quantity. By adjusting the thresholds of
selections on the neural network output, three working points, again called "loose", "medium",
and "tight", are defined, similar to those discussed in Section 3.1.

4 Efficiency of τ_h Reconstruction and Identification

To compare the performance of τ_h reconstruction in data and MC simulation, a set of MC
samples is used to reproduce a mixture of signal and background events. The signal is expected
to come from inclusive Z → ττ production. The major sources of background are ττ Drell-Yan
production outside of the Z-mass region, W production with associated jets, QCD multijet, and
top-quark pair (tt̄) production. The Drell-Yan signal and background are simulated with the
next-to-leading order (NLO) MC generator POWHEG [11-13]. The QCD multijet and W backgrounds
are simulated with PYTHIA [14] and the top quark samples with MadGraph [15]. The τ-lepton
decays are simulated with Tauola [16]. The samples are normalized using the cross section at
next-to-next-to-leading order (NNLO) for Drell-Yan and W, at leading order (LO) for QCD, and
NLO for the tt̄ sample. The MC samples are mixed based on the corresponding cross sections.

To measure the efficiency of τ_h reconstruction and identification in data, a tag-and-probe
method is used with a sample of Z → ττ → μτ_h events. The events are preselected using kinematic
cuts and a set of requirements to suppress the background from Z → μμ, W, and QCD events,
but without applying the τ_h-identification algorithms. The preselection requires the event to
be triggered by a single-muon high level trigger [17], and to contain only one isolated muon
with p_T^μ > 15 GeV/c within the geometric acceptance |η_μ| < 2.1, which is used as a tag. An
isolated jet candidate of p_T^jet > 20 GeV/c within the geometric acceptance |η_jet| < 2.3, with a
"leading" (highest-p_T) track constituent in the jet with p_T > 5 GeV/c, is used as a probe. The
preselection is needed to increase the percentage of Z → ττ events in the final sample. This
preselection clearly biases the sample, but the bias is taken into account when computing the
final efficiency. The four-momentum of the jet is reconstructed using the anti-k_T algorithm with
a distance parameter of 0.5 [10]. The muon and the "leading" track in the jet are required to be
of opposite charge. To suppress background from W+jet(s) events, an additional requirement
that the transverse mass, M_T, of the muon and missing transverse energy, E_T^miss, be less than
40 GeV is applied. The transverse mass is defined as M_T = √(2 p_T^μ E_T^miss (1 - cos Δφ)),
where p_T^μ is the muon transverse momentum and Δφ is the azimuthal angle between the E_T^miss
vector and p_T^μ.
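
The transverse-mass requirement can be written out explicitly. The sketch below is illustrative only; the function names are hypothetical and the 40 GeV threshold is the one quoted in the preselection.

```python
import math


def transverse_mass(muon_pt, met, dphi):
    """M_T = sqrt(2 * pT(mu) * ET(miss) * (1 - cos(dphi))), inputs in GeV and radians."""
    return math.sqrt(2.0 * muon_pt * met * (1.0 - math.cos(dphi)))


def passes_w_veto(muon_pt, met, dphi, max_mt=40.0):
    """True if the event survives the W+jets suppression cut."""
    return transverse_mass(muon_pt, met, dphi) < max_mt
```

A muon and missing transverse energy of 40 GeV each, back to back in azimuth, give M_T = 80 GeV and are rejected, whereas collinear configurations give M_T close to zero and are kept.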

The HPS and TaNC algorithms are both applied to the preselected events. The resulting
invariant mass distributions of the μ-jet system for those events that pass or fail the τ_h
identification are fitted using signal and background distributions provided by MC simulation.
The efficiency is then calculated as ε = N_pass^{Z→ττ} / (N_pass^{Z→ττ} + N_fail^{Z→ττ}), where
N_pass^{Z→ττ} and N_fail^{Z→ττ} are the numbers of Z → ττ events in the "passed" and "failed"
samples after background contributions are subtracted. Figure 1 shows the invariant
mass of the μ-jet system for preselected events that pass (left) and fail (right) the "loose" τ_h
identification requirements. Since in the "failed" sample there is no τ_h reconstructed, for
consistency the visible mass is always computed using the jet four-vector and not the four-vector
as reconstructed by the τ_h algorithms. The MC predictions for signal and background events
are also shown. The "passed" sample is dominated by Z → ττ events, with a small background
contribution. The sample of "failed" events is dominated by background contributions. The MC
predictions describe the data reasonably well. The stability of the fit results is tested by using
background estimates from data instead of the MC predictions and by varying the invariant
mass ranges for the fit. All checks demonstrate consistent results within the uncertainties of
the method.
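
The efficiency definition and the data-to-MC ratio derived from it are simple enough to write out. The sketch below assumes the signal yields have already been extracted by the fits; the function names are hypothetical.

```python
def tag_and_probe_efficiency(n_pass, n_fail):
    """Efficiency from background-subtracted signal yields in the two samples."""
    return n_pass / (n_pass + n_fail)


def data_to_mc_ratio(eff_data, eff_mc):
    """Correction factor to be applied to the simulated efficiency."""
    return eff_data / eff_mc
```

For instance, hypothetical yields of 70 passing and 30 failing signal events correspond to an efficiency of 0.70.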

Results of the fits are summarized in Table 2. The values measured in data, "Fit data", are
compared with the expected values, "Expected MC", obtained by repeating the fitting procedure
on simulated events. The efficiency of the τ_h algorithms on preselected events is approximately
30% higher than for an inclusive sample, without preselection. In general the value of the
efficiency depends on the p_T and η requirements, which are applied in each individual physics
analysis. The main goal of this study is to perform the data-to-MC comparison and to
determine data-to-MC correction factors and their uncertainties. The mean values of the fits in
data and MC simulation are observed to agree to within a few percent, although with this data
sample the statistical uncertainties of the fits are in the range of 20-30%.

[Figure 1 panels: CMS, √s = 7 TeV, 36 pb^-1; horizontal axis: μ-jet visible mass (GeV/c^2)]

Figure 1: Invariant mass distribution of the μ-jet system for preselected events which pass (left)
and fail (right) the HPS "loose" τ_h identification requirements (solid symbols) compared to
predictions of the MC simulation (histograms).

Table 2: Efficiency for a τ_h to pass the HPS and TaNC identification criteria, measured by fitting
the Z → ττ signal contribution in the samples of the "passed" and "failed" preselected events.
The uncertainties of the fit are statistical only. The statistical uncertainties of the MC predictions
are small and can be neglected. The last column represents the data-to-MC correction factors
and their full uncertainties including statistical and systematic components. Data-to-MC ratios
for the τ_h reconstruction efficiency determined using fits to the measured Z production cross
sections as described in [5] are also shown.

Algorithm        Fit data       Expected MC   Data/MC
HPS "loose"      0.70 ± 0.15    0.70          1.00 ± 0.24
HPS "medium"     0.53 ± 0.13    0.53          1.01 ± 0.26
HPS "tight"      0.33 ± 0.08    0.36          0.93 ± 0.25
TaNC "loose"     0.76 ± 0.20    0.72          1.06 ± 0.30
TaNC "medium"    0.63 ± 0.17    0.66          0.96 ± 0.27
TaNC "tight"     0.55 ± 0.15    0.55          1.00 ± 0.28

Algorithm        Method                       Data/MC
HPS "loose"      ττ combined fit [5]          0.94 ± 0.09
HPS "loose"      ττ to μμ, ee fit [5]         0.96 ± 0.07

Systematic uncertainties on the measured τ_h identification efficiencies arise from
uncertainties on track reconstruction (4%) and from uncertainties on the probabilities for jets
to pass the "leading" track p_T and loose isolation requirements applied in the preselection
(< 12%). Uncertainties on track momentum and energy scales have an effect on the measured
τ_h identification efficiencies below 1%. All numbers represent relative uncertainties.

The resulting ratios of the measured efficiencies to those predicted by MC simulation for τ_h
decays to pass the "loose", "medium", and "tight" HPS and TaNC working points are presented
in the last column of Table 2. The uncertainties on the ratios represent the full uncertainties
of the method, which are calculated by adding the statistical and systematic uncertainties in
quadrature. The total uncertainty of the measured efficiency of the τ_h algorithms is dominated
by the statistical uncertainty of the fit. The simulation describes the data well. Since the same
event sample is used to evaluate efficiencies for different working points, the results are
correlated.
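
As a rough cross-check of how the quoted components combine, the sketch below adds relative uncertainties in quadrature for the HPS "loose" working point. It is an illustration under simplifying assumptions: the systematic components are taken at their quoted upper values (4%, 12%, and 1%), correlations are ignored, and the function name is hypothetical.

```python
import math


def add_in_quadrature(*relative_uncertainties):
    """Combine independent relative uncertainties in quadrature."""
    return math.sqrt(sum(u * u for u in relative_uncertainties))


# HPS "loose": fitted efficiency 0.70 with a statistical uncertainty of 0.15.
stat = 0.15 / 0.70
# Track reconstruction, preselection probabilities, momentum and energy scales.
syst = add_in_quadrature(0.04, 0.12, 0.01)
total = add_in_quadrature(stat, syst)
```

With these inputs the total relative uncertainty comes out at about 25%, close to the 0.24 quoted for the data-to-MC ratio of 1.00, and the statistical term is clearly the larger of the two.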

The values presented in Table 2 are used as inputs for fits to measure the uncertainty of the
τ_h reconstruction and identification efficiency with higher precision by comparing the yield of
the Z → ττ events in different decay modes and the yield of Z → μμ and Z → ee events,
as described elsewhere [5]. The first approach uses a simultaneous fit of the four Z → ττ
decay channels with final states eμ, μμ, μτ_h, and eτ_h. As a result of the fit, the combined cross
section and τ_h efficiency are measured. The data-to-MC correction factor for the HPS "loose"
working point is measured to be 0.94 ± 0.09. The second approach is based on a comparison
of the τ_h channels, Z → ττ → μτ_h and eτ_h, to the combined Z → μμ, ee cross section as measured
by CMS. The data-to-MC correction factor for the HPS "loose" working point in this case is
measured to be 0.96 ± 0.07. The slightly smaller uncertainty of the latter method is explained
by the higher precision of the combined Z → μμ, ee cross-section measurement. These values
are also presented in Table 2. Both approaches yield smaller uncertainties, 9% and 7%,
than the 24% from the tag-and-probe method, for the "loose" HPS working point. To achieve
this precision, the methods rely on assumptions about the physics source of the signal, i.e., the
values of the inclusive Z production cross section and Z → ττ branching fraction, and the
absence of non-SM sources in the data sample. In physics analyses where these assumptions
cannot be made, such as the measurement of the Z → ττ production cross section itself [5] and
the search for H → ττ [6], the tag-and-probe method remains the only one available.

The expected τ_h efficiency values from the Z → ττ process, with a reconstructed |η_τh| < 2.3,
and either p_T^{τ_h} > 15 GeV/c or p_T^{τ_h} > 20 GeV/c, are estimated using simulated events and
presented in Table 3. The selections are applied both at the generated and reconstructed levels.
A matching of ΔR < 0.15 between the generated and reconstructed τ_h directions is required.
Figure 2 shows the expected efficiencies as a function of the generated p_T^{τ_h} for all working
points of each algorithm.

Table 3: The expected efficiency for τ_h decays to pass the HPS and TaNC identification criteria
estimated using Z → ττ events from the MC simulation for two different selection requirements
on p_T^{τ_h}. The requirement is applied both at the reconstruction and generator levels. The
statistical uncertainties of the MC predictions are smaller than the least significant digit of the
efficiency values in the table and are not shown.

                                     HPS                            TaNC
Algorithm                            "loose"  "medium"  "tight"     "loose"  "medium"  "tight"
Efficiency (p_T^{τ_h} > 15 GeV/c)    0.46     0.34      0.23        0.54     0.43      0.30
Efficiency (p_T^{τ_h} > 20 GeV/c)    0.50     0.37      0.25        0.58     0.48      0.36

Figure 2: The expected efficiency of the τ_h algorithms as a function of generated p_T^{τ_h},
estimated using a sample of simulated Z → ττ events for the HPS (left) and TaNC (right)
algorithms, for the "loose", "medium", and "tight" working points.

5 Reconstruction of the τ_h Decay Mode

The correlation between the generated and reconstructed τ_h decay modes is studied using a
sample of simulated Z → ττ events. The results are presented in Fig. 3 (left). Each column
represents one generated decay mode normalized to unity. Each row corresponds to one
reconstructed decay mode. The numbers demonstrate the fraction of generated τ_h of a given type
reconstructed in a specific decay mode. Both generated and reconstructed τ_h are required to
have a visible transverse momentum p_T^{τ_h} > 15 GeV/c, and to match within a cone of ΔR = 0.15.
For each of the generated decay modes, the fraction of correctly reconstructed decays is more
than 80%, reaching 90% for the three-charged-pion decay mode.

[Figure 3 panel label: CMS Simulation, √s = 7 TeV]

Figure 3: (left) The fraction of generated τ_h decays of a given type reconstructed in a certain
decay mode for the HPS "loose" working point from simulated Z → ττ events. (right) The
relative yield of τ_h reconstructed in different decay modes in the Z → ττ → μτ_h data sample
compared to the MC predictions. The MC simulation is a mixture of the signal and background
samples based on the corresponding cross sections, as shown by the histograms.

A data-to-MC comparison of the relative yield of events reconstructed in different τ_h decay
modes in a data sample of Z → ττ → μτ_h events is shown in Fig. 3 (right). The events are
selected using the requirements described in Section 4. The τ_h candidates are required to have
visible transverse momenta p_T^{τ_h} > 20 GeV/c within the geometric acceptance |η| < 2.3. The MC
sample represents a mixture of the signal and background MC samples based on the corresponding
cross sections. The performance of the τ_h algorithm is well reproduced by the MC simulation.

6 Reconstruction of the τ_h Energy

Since charged hadrons and photons are reconstructed with high precision using the PF
techniques, the reconstructed τ_h energy is expected to be close to the true energy of its visible
decay products. According to simulation, the ratio of the reconstructed to the true visible τ_h
energy for the HPS algorithm is constant as a function of energy and within 2% of unity, while
for TaNC it decreases by about 2% as p_T^{τ_h} approaches 60 GeV/c. The η dependence is more
pronounced. For both algorithms the reconstructed τ_h energy is underestimated by 5% with
respect to the true energy as one moves towards higher |η| (from barrel to endcap region).

The quality of the τ_h energy scale simulation can be examined by analyzing the Z → ττ → μτ_h
data sample. The reconstructed invariant mass of the μτ_h system is very sensitive to the energy
scale of the τ_h, since the muon four-momenta are measured with high precision. By varying the
τ_h energy scale simultaneously in the signal and background MC samples, a set of templates is
produced. The resulting templates are fitted to the data and the best agreement is achieved by
scaling the τ_h energy in simulation by a factor 0.97 ± 0.03, where the uncertainty is averaged
over the pseudorapidity range of the data sample.

Figure 4: The reconstructed invariant mass of τ_h decaying into one charged and one neutral
pion (left) and into three charged pions (right) from data, compared to predictions of the
simulation. The solid lines represent results of the best fit described in the text and the dashed
lines represent the predictions with the tau energy scale, TauES, varied up and down by 3% with
respect to the best fit value.

A complementary procedure, which does not assume knowledge of the ττ invariant mass
spectrum, is based on the invariant mass of reconstructed τ_h constituents, shown in Fig. 4. The
method uses τ_h as an independent object but relies on good understanding of underlying
background events that contribute to the signal sample. The fit is performed separately for the
ππ^0 and πππ decay channels, since the major source of the uncertainty is expected to come from
the reconstruction of the electromagnetic energy. The simulation describes both decay channels
well. The best agreement is achieved by scaling the τ_h energy in simulation by a factor
0.97 ± 0.03 for the ππ^0 decay mode and by a factor 1.01 ± 0.02 for the πππ decay mode. The effect
of the energy-scale uncertainty on the shape of the τ_h invariant mass distribution is also shown
in Fig. 4. Varying the energy scale in simulation by the uncertainty derived from the μτ_h
invariant mass fit, i.e. 3%, corresponds to a significant deviation in the predicted τ_h mass shape.

7 Measurement of the τ_h Misidentification Rate for Jets

Jets that could be misidentified as τ_h have different properties depending on their origin. Most
of the jets are produced in QCD processes, either with or without the associated production of Z
or W bosons. To distinguish between them, different data samples are selected. The QCD-type,
gluon-enriched, jets are selected using events with at least one jet of transverse momentum
p_T^jet > 15 GeV/c and a second jet of p_T^jet > 10 GeV/c, both within |η| < 2.5. The Z- and
W-type, quark-enriched, jets are selected by requiring at least one isolated muon with transverse
momentum p_T^μ > 15 GeV/c and |η| < 2.1 and a jet of transverse momentum p_T^jet > 10 GeV/c
within |η| < 2.5. In addition, a muon-enriched QCD sample is selected by requiring a muon and a
jet, but suppressing the W contribution by selecting events with M_T < 40 GeV/c^2. For each of
these samples additional selection requirements are applied to suppress the background
contribution from events with jets from other sources.

Figure 5 shows the τ_h misidentification rate as a function of the jet p_T for the "loose" working
points of the HPS and TaNC algorithms, where the measured values are compared with the MC
predictions for the different types of jets. The misidentification rates expected from simulation
and the measured data-to-MC ratios are summarized in Table 4 for the three working points
of both reconstruction algorithms. The values are integrated over the p_T and η phase space
used in the Z → ττ analysis, p_T > 20 GeV/c and |η| < 2.3. The misidentification rate as
a function of reconstruction efficiency for all working points of both algorithms is shown in
Fig. 6, which summarizes the MC-estimated efficiency and the measured misidentification rate
values presented in Tables 3 and 4. Since the QCD and μ-enriched QCD misidentification rate
values are observed to be similar, only one set of QCD points is shown. Open symbols represent
results obtained by running an early fixed-cone τ_h-identification algorithm, used in the CMS
physics technical design report (PTDR) [18], on simulated events. The decay-mode-based HPS
and TaNC algorithms perform significantly better than the fixed-cone algorithm.
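As a worked illustration of how the two columns of Table 4 combine, the misidentification rate implied for data is the MC rate multiplied by the data-to-MC ratio. The sketch below uses the HPS "loose" W + jets entry (MC rate 1.5%, ratio 0.99 ± 0.04). The function name is hypothetical, and the MC rate is assumed to carry no uncertainty of its own, which is a simplification since the table quotes it to one decimal place only.

```python
def implied_data_rate(mc_rate_percent, ratio, ratio_unc):
    """Return (rate, uncertainty) in percent from an MC rate and a data-to-MC ratio.

    Assumes the MC rate is exact, so only the ratio uncertainty is propagated.
    """
    return mc_rate_percent * ratio, mc_rate_percent * ratio_unc


# HPS "loose", W + jets entry of Table 4.
rate, unc = implied_data_rate(1.5, 0.99, 0.04)
print(f"implied data rate: {rate:.2f} +/- {unc:.2f} %")
```

The result, roughly 1.5% with an uncertainty of about 0.06 percentage points, is only as precise as the rounded MC value allows.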

Figure 5: Misidentification probabilities for jets to pass the "loose" working points of the HPS
(left) and TaNC (right) algorithms as a function of jet p_T for QCD, μ-enriched QCD, and W-type
events. The misidentification rates measured in data are shown by solid symbols and
compared to the MC prediction, displayed with open symbols.



8 Measurement of the τ_h Misidentification Rate for Electrons

Isolated electrons passing the identification and isolation criteria of the τ_h algorithms are also
an important source of background in many analyses with τ_h in the final state. In this case
the electron is misidentified as a pion originating from τ_h. A multivariate discriminant is used
to reduce this background, improving the separation between pions and electrons. The
discriminant is implemented in the PF algorithm and its output is denoted by ξ. The value of the

Figure 6: The measured misidentification rate as a function of the MC-estimated
reconstruction efficiency for the three working points of the HPS and TaNC algorithms from
μ-enriched QCD and W data samples. For each algorithm the "loose", "medium", and "tight"
selections are the points with highest, middle, and lowest efficiencies, respectively. The PTDR
points represent results of the fixed-cone τ_h-identification algorithm [18] on simulation.



Table 4: The MC-predicted misidentification rates and the measured data-to-MC ratios,
integrated over the p_T and η phase space typical for the Z → ττ analysis.

Algorithm        QCD                    QCD μ                  W + jets
                 MC (%)   Data/MC       MC (%)   Data/MC       MC (%)   Data/MC
HPS "loose"      1.0      1.00 ± 0.04   1.0      1.07 ± 0.01   1.5      0.99 ± 0.04
HPS "medium"     0.4      1.02 ± 0.06   0.4      1.05 ± 0.02   0.6      1.04 ± 0.06
HPS "tight"      0.2      0.94 ± 0.09   0.2      1.06 ± 0.02   0.3      1.08 ± 0.09
TaNC "loose"     2.1      1.05 ± 0.04   1.9      1.12 ± 0.01   3.0      1.02 ± 0.05
TaNC "medium"    1.3      1.05 ± 0.05   0.9      1.08 ± 0.02   1.6      0.98 ± 0.07
TaNC "tight"     0.5      0.98 ± 0.07   0.4      1.06 ± 0.02   0.8      0.95 ± 0.09

discriminant ξ ranges between −1.0 (most compatible with the pion hypothesis) and 1.0 (most
compatible with the electron hypothesis).

Two working points, corresponding to ξ < −0.1 and ξ < 0.6, are considered in this
analysis. The first working point rejects even poorly reconstructed electrons
and is optimized for a low misidentification rate, about 2%, at the price of losing about 4%
of genuine τ_h. The second working point has a larger misidentification rate of about
20%, since it was optimized for efficiencies exceeding 99.5%. It rejects only well-identified
electrons.
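The trade-off between the two working points can be illustrated with a toy threshold scan. This is a sketch under stated assumptions: the function names are hypothetical and the discriminant values below are invented for illustration, not taken from data or simulation. A candidate passes the electron rejection when its discriminant is below the working-point threshold.

```python
# Thresholds of the two working points quoted in the text.
WORKING_POINTS = {"tight veto": -0.1, "loose veto": 0.6}


def passes_electron_rejection(xi, threshold):
    """A candidate is kept as tau_h if its discriminant is below the threshold."""
    return xi < threshold


def pass_fraction(xi_values, threshold):
    """Fraction of candidates that survive the electron rejection."""
    kept = sum(1 for xi in xi_values if passes_electron_rejection(xi, threshold))
    return kept / len(xi_values)


# Invented toy discriminant values: electrons peak near +1, pions near -1.
toy_electrons = [0.95, 0.9, 0.8, 0.7, 0.3, -0.5]
toy_taus = [-0.95, -0.9, -0.8, -0.6, -0.3, 0.2]

for name, threshold in WORKING_POINTS.items():
    misid = pass_fraction(toy_electrons, threshold)
    eff = pass_fraction(toy_taus, threshold)
    print(f"{name}: toy misidentification {misid:.2f}, toy efficiency {eff:.2f}")
```

Raising the threshold keeps more genuine τ_h candidates but also lets more electrons through, which is the behaviour described for the second working point.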

The probability for an electron to be misidentified as τ_h, the e → τ_h misidentification rate, is
determined using a sample of isolated electrons coming from the decay Z → ee. The events
are required to have a reconstructed electron and an electron that is reconstructed as τ_h. The
particles must have opposite charge. The invariant mass of the pair is required to be between
60 and 120 GeV/c². The tag electron is required to be isolated and to have a p_T in excess of
25 GeV/c. The second electron, the probe, is required to pass the HPS "loose" working point,
without requiring any specific veto against electrons, and to have p_T in excess of 15 GeV/c. The
e → τ_h misidentification rate is estimated by measuring the ratio between the number of probes
passing the electron-rejection discriminant and the overall number of selected probes. The
sample of events that does not pass the electron-rejection discriminant is populated by
well-reconstructed electrons. The sample that passes the discriminant contains poorly reconstructed
electrons, as well as other background contributions, "misidentified electrons". To remove the
contamination from misidentified electrons, a background subtraction procedure is performed
by fitting the passing and failing eτ_h invariant mass distributions to the superposition of signal
and background components.
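The ratio described above can be sketched as follows. This is a simplified illustration, not the fit used in the analysis: the background yields are taken as given inputs rather than extracted from a fit to the invariant mass distributions, the function name is hypothetical, and the uncertainty is a plain binomial estimate that ignores the uncertainty of the background subtraction.

```python
import math


def misid_rate(n_pass, n_fail, bkg_pass=0.0, bkg_fail=0.0):
    """Background-subtracted tag-and-probe rate and its binomial uncertainty.

    n_pass, n_fail     : probes passing / failing the electron-rejection discriminant
    bkg_pass, bkg_fail : estimated non-electron background in each category
                         (assumed inputs; the analysis obtains them from a fit)
    """
    sig_pass = n_pass - bkg_pass
    sig_fail = n_fail - bkg_fail
    if sig_pass < 0 or sig_fail < 0 or sig_pass + sig_fail <= 0:
        raise ValueError("background estimate exceeds the observed yield")
    total = sig_pass + sig_fail
    rate = sig_pass / total
    # Simple binomial error; fit and background uncertainties are not included.
    unc = math.sqrt(rate * (1.0 - rate) / total)
    return rate, unc


# Invented toy yields: 30 passing probes of which 10 are background, 980 failing.
rate, unc = misid_rate(n_pass=30, n_fail=980, bkg_pass=10, bkg_fail=0)
print(f"toy e -> tau_h misidentification rate: {rate:.3f} +/- {unc:.3f}")
```

With these toy yields the rate is 20 / 1000 = 2%, of the same order as the first working point, although the numbers themselves are illustrative only.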

Table 5 gives the ratio between the misidentification rates as measured in the data and those
obtained using MC simulation for two |η| bins. In the central η region, the simulation
underestimates the measured misidentification rates. Within the uncertainties of the measurement the
data-to-MC ratios for both discriminants agree in the same η intervals.

Table 5: The e → τ_h misidentification rates found by applying the tag-and-probe method to the
MC simulation, and the ratio of the tag-and-probe values obtained in data and MC simulation,
shown in two regions of η and for two working points of the electron-rejection discriminant.

Bin          Discriminant ξ < −0.1           Discriminant ξ < 0.6
             MC (%)          Data/MC         MC (%)          Data/MC
|η| < 1.5    2.21 ± 0.05     1.13 ± 0.17     13.10 ± 0.08    1.14 ± 0.04
|η| > 1.5    3.96 ± 0.09     0.82 ± 0.18     26.80 ± 0.16    0.90 ± 0.04
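The statement that the data-to-MC ratios of the two discriminants agree within uncertainties in each η interval can be checked with a simple pull calculation on the Table 5 values. This is a rough sketch: the function name is hypothetical, and the two uncertainties are treated as uncorrelated, which is an assumption, since both working points are measured on overlapping probe samples and real correlations would change the pull.

```python
import math


def pull(a, sigma_a, b, sigma_b):
    """Difference of two measurements in units of their combined uncertainty.

    Assumes the two uncertainties are uncorrelated.
    """
    return (a - b) / math.hypot(sigma_a, sigma_b)


# Data/MC ratios from Table 5: first working point vs second working point.
central = pull(1.13, 0.17, 1.14, 0.04)   # first eta bin
forward = pull(0.82, 0.18, 0.90, 0.04)   # second eta bin

print(f"first bin pull:  {central:+.2f}")
print(f"second bin pull: {forward:+.2f}")
```

Both pulls are well below one in magnitude under this assumption, consistent with the agreement stated in the text.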



9 Summary 

The performance of two reconstruction algorithms for hadronic tau decays developed by
CMS, HPS and TaNC, has been studied using the data sample collected at a centre-of-mass
energy of 7 TeV in 2010 and corresponding to an integrated luminosity of 36 pb⁻¹. Both
algorithms show good performance in terms of τ_h identification efficiency, approximately 50%,
while keeping the misidentification rate for jets at the level of ~1%. The MC simulation was
found to describe the data well. The identification efficiency was measured with an
uncertainty of 24% by using a tag-and-probe method in a Z → ττ → μτ_h data sample, and with an
uncertainty of 7% by using a global fit to all Z → ττ decay channels and constraining the yield
to the measured combined Z → μμ, ee cross section. The scale factor for measured energies
was found to be close to unity with a relative uncertainty less than 3%.